Method of dividing structured documents into several parts

ABSTRACT

The method applies to a structured document (D) presenting a hierarchical structure defined by a structure schema, the document combining a main structured set (1) of information including information subsets (1.1, 1.2, 1.3, ..., 1.2.2.2), at least some of the information subsets being structured and being capable of including information subsets of lower hierarchical level, each information subset being associated in the higher level information set with a respective information type (T). The method comprises the steps of: dividing the document into structured portions (P1, P2, P3) capable of being handled individually, namely a main portion (P1) and at least one secondary portion (P2, P3), the main portion containing at least the main set (1) of information, and the secondary portion containing an information subset (1.2.1, 1.2.2) which is removed from the main set of information, each secondary portion being attached to the main portion or to another secondary portion; and allocating a predefined value to the information type of each information subset (1.2.1, 1.2.2) that has been removed from an information set (1.2) of higher hierarchical level.

The present invention relates to a method enabling structured documents to be divided into several parts.

It applies particularly but not exclusively to handling, transmitting, storing, and reading structured multimedia documents, digital or video images or image sequences, movies or video programs, and more generally to any transfer of said documents between processor units interconnected by data transmission networks, or between a processor unit and a storage unit, or indeed between a processor unit and a playback unit such as a television set if the document is a video program.

More and more frequently, documents handled and transmitted in this way contain a plurality of different types of data integrated in a structure. A structured document is a collection of data sets, each associated with a type and attributes and interconnected by relationships that are mainly hierarchical. Such documents use a markup language such as Standard Generalized Markup Language (SGML), Hypertext Markup Language (HTML), or Extensible Markup Language (XML), serving in particular to distinguish between the various subsets of information making up the document. In contrast, in a “linear” document, the content information of the document is mixed in with layout information and type information.

A structured document includes markers for separating different sets of information in the document. For SGML, XML, or HTML formats, these markers are referred to as “tags” and have the form “<XXXX>” and “</XXXX>”, the first marker marking the beginning of a set of information called “XXXX”, and the second marking the end of said set. A set of information may itself be made up of a plurality of lower-level sets of information. Thus, a structured document presents a tree or hierarchical structure schema, each node representing a set of information and being connected to a node at a higher hierarchical level representing a set of information that contains the sets of information at lower level. The nodes situated at the ends of branches in such a tree structure represent sets of information containing data of predetermined type, which cannot themselves be resolved into subsets of information.
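To make the correspondence between tags and tree nodes concrete, here is a minimal Python sketch using the standard `xml.etree.ElementTree` parser on a small hypothetical XML document (the tag names `movie`, `title`, `scene`, and `shot` are invented for illustration). The helper `leaf_paths`, also hypothetical, walks the tree and reports the sets situated at the ends of branches, which hold data rather than further subsets.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each <tag>...</tag> pair delimits one information set.
DOC = """<movie>
  <title>Example</title>
  <scene><shot>opening</shot><shot>closing</shot></scene>
</movie>"""


def leaf_paths(elem, prefix=""):
    """Yield (path, text) for every set that has no lower-level subsets."""
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        # End of a branch: the set carries data, not further subsets.
        yield path, (elem.text or "").strip()
    for child in children:
        yield from leaf_paths(child, path)


root = ET.fromstring(DOC)
for path, text in leaf_paths(root):
    print(path, "->", text)
```

The two `shot` sets share the same path, which is why the location notations discussed later also carry an order number.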

Thus, a structured document contains separation markers represented in textual or binary data form, said markers defining information sets or subsets that can themselves contain other subsets of information defined by the markers.

A structured document is associated with a structure schema defining the structure in the form of rules, together with the type of information in each set of information of the document. A schema is constituted by nested groups of information set structures, these groups possibly being ordered sequences, groups of alternative elements, or groups of necessary elements, ordered or not ordered.
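The three kinds of group can be illustrated with a small Python sketch that checks whether a sequence of child element names conforms to a nested group. The classes `Seq`, `Choice`, and `All` and the functions `ends` and `conforms` are hypothetical names chosen for this example; a real schema language (for instance the `sequence`, `choice`, and `all` groups of XML Schema) has further rules, such as occurrence counts, that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class Seq:
    """Ordered sequence: every item, in this order."""
    items: list


@dataclass
class Choice:
    """Group of alternative elements: exactly one item."""
    items: list


@dataclass
class All:
    """Necessary elements in any order (element names only in this sketch)."""
    items: list


def ends(group, tags, pos):
    """Return the set of positions at which `group` can finish matching tags[pos:]."""
    if isinstance(group, str):  # a single element name
        return {pos + 1} if pos < len(tags) and tags[pos] == group else set()
    if isinstance(group, Seq):
        positions = {pos}
        for item in group.items:
            positions = {end for p in positions for end in ends(item, tags, p)}
        return positions
    if isinstance(group, Choice):
        return {end for item in group.items for end in ends(item, tags, pos)}
    if isinstance(group, All):
        n = len(group.items)
        window = tags[pos:pos + n]
        return {pos + n} if sorted(window) == sorted(group.items) else set()
    raise TypeError(f"unknown group: {group!r}")


def conforms(group, tags):
    """True if the child sequence `tags` is fully matched by `group`."""
    return len(tags) in ends(group, tags, 0)


# Hypothetical schema: a title, then a scene or a trailer, then date and rating in any order.
SCHEMA = Seq(["title", Choice(["scene", "trailer"]), All(["date", "rating"])])
print(conforms(SCHEMA, ["title", "scene", "rating", "date"]))
```

Returning a set of end positions lets nested groups be combined without backtracking code, at the cost of exploring every alternative of a `Choice`.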

At present, when a structured document is to be transmitted, it is initially compressed so as to minimize the volume of data to be transmitted. For best efficiency in such compression processing, the document structuring data is also compressed, given that the recipient of the document is assumed to know the structure schema of the document beforehand and to be able to use the structure schema to determine at all times what information set is about to be received. It is therefore essential for the structure of the document as transmitted to correspond exactly to the structure schema that the recipient of the document intends to use for receiving and decoding the document, since otherwise the recipient cannot determine the type of data that has been transmitted and is thus incapable of decoding the data and of reconstituting the original document.

Unfortunately, structured documents for transmission are tending to become more and more voluminous. Proposals have been made, for example, to transmit or broadcast complete descriptions of movies or TV programs in this way.

In this context, if a transmission error should occur while a document is being transmitted, the recipient of the document may no longer be able to determine which subset is being transmitted, in which case the entire document needs to be transmitted again. Furthermore, if it is desired to transmit a movie sequence and display it simultaneously on a screen, it can be necessary to comply with time periods for transmitting the various elements of the sequence. Certain elements of the sequence must also be capable of being transmitted several times over, so as to enable a recipient who was not connected at the beginning of the transmission of the sequence to receive and display the end of the sequence.

It may also be necessary to replace a portion of a document by another, these two portions having the same structure schema.

The solution which consists in retransmitting the entire document leads to a considerable increase in the volume of information that needs to be transmitted. It is therefore desirable to be able to divide a document into a plurality of portions which are transmitted separately. It turns out that present transmission methods are not suitable for transmitting a document in part only.

An object of the invention is to overcome that drawback. This object is achieved by providing a method of dividing a structured document presenting a hierarchical structure defined by a structure schema, the document combining a main set of information including information subsets, at least some of the information subsets being capable of including information subsets of lower hierarchical level, each information subset being associated with a respective information type.

According to the invention, the method comprises the steps of:

dividing the document into portions that can be handled separately, namely a main portion and at least one secondary portion, the main portion containing at least the main set of information, and the secondary portion containing an information subset which is removed from the main set of information, each secondary portion being attached to the main portion or to another secondary portion; and

allocating a predefined value to the information type of each information subset that has been removed from a higher level information set.
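The two steps can be sketched in Python on the FIG. 1 tree. This is a minimal illustration, not the claimed implementation: it assumes an in-memory tree of hypothetical `Node` objects, locations expressed as lists of 0-based child indices, and a predefined value of zero (the example value the description gives for compressed documents). Splitting off subsets 1.2.1 and 1.2.2, the subsets cited in the abstract, yields a main portion and two secondary portions.

```python
from dataclasses import dataclass, field

REMOVED = 0  # hypothetical predefined type value marking a removed subset


@dataclass
class Node:
    info_type: int  # information type T (arbitrary non-zero values in this sketch)
    label: str      # FIG. 1 node number, kept only for readability
    children: list = field(default_factory=list)


def node(label, *children):
    """Shorthand for a node with an arbitrary non-zero type."""
    return Node(1, label, list(children))


def split(root, location):
    """Remove the subset at `location` (0-based child indices from the root).

    The parent keeps a placeholder whose type is REMOVED, and the removed
    subset is returned as a secondary portion together with its location."""
    parent = root
    for index in location[:-1]:
        parent = parent.children[index]
    removed = parent.children[location[-1]]
    parent.children[location[-1]] = Node(REMOVED, "")
    return {"location": list(location), "body": removed}


# FIG. 1 tree; it becomes the main portion once subsets are split off.
main = node("1",
            node("1.1"),
            node("1.2",
                 node("1.2.1", node("1.2.1.1")),
                 node("1.2.2", node("1.2.2.1"), node("1.2.2.2"))),
            node("1.3", node("1.3.1")))
p2 = split(main, [1, 0])  # subset 1.2.1 becomes a secondary portion
p3 = split(main, [1, 1])  # subset 1.2.2 becomes a secondary portion
print([child.info_type for child in main.children[1].children])
```

Each returned portion keeps its whole subtree, so it can be decoded without the main portion, while the placeholders in the main portion tell a decoder that a subset is missing at that point.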

In this way, each portion is understandable on its own and can be decoded regardless of the selected partitioning. In addition, when such a portion is transmitted and the transmission fails, the remainder of the document remains valid and only the portion that was not transmitted correctly needs to be retransmitted, there being no need to retransmit the entire document. Furthermore, there is no need to have main portions and secondary portions upstream from a portion in order to be able to decode that portion, since each portion is valid and comprehensible on its own. By means of these dispositions, a transmitted document can be enriched and modified as time progresses.

Advantageously, the document includes a header which is inserted in each portion, the header including a flag whose value specifies whether or not the document is complete.

According to a feature of the invention, each portion has a header containing information giving the location of the portion in the hierarchical structure of the document.

Said information concerning the location of the secondary portion in the hierarchical structure of the document advantageously describes a path in said structure, defining the position of the secondary portion in the document.

Said path may be defined in absolute manner, relative to the main set of information of the document. It may also be defined in relative manner, relative to the position of the most recently transmitted secondary portion.

Alternatively, each information type allocated the predefined value is followed by a reference to the secondary portion containing the subset of information associated with that information type, said information concerning the location of the secondary portion in the hierarchical structure of the document then being the reference of said secondary portion.

The method may also include transmitting a plurality of document portions associated with the same location in the structure. Under such circumstances, the most recently transmitted portion replaces the previous portion that was associated with the same location.

Provision may also be made for the header of each portion to contain information specifying a way of processing the portion relative to a portion associated with the same location in the structure.

The structured document may be of the SGML, XML, or HTML type, for example.

A preferred embodiment of the invention is described below by way of non-limiting examples and with reference to the accompanying drawing, in which:

FIG. 1 shows a tree structure in which each node symbolizes a set or a subset of information in a structured document which is normally transmitted as a single entity;

FIG. 2 shows the structured document of FIG. 1 partitioned into a plurality of portions, each capable of being transmitted separately in accordance with the invention;

FIG. 3 shows in greater detail the structure of the information contained in a structured document; and

FIG. 4 shows another tree structure illustrating a method of defining the position of a portion of the structure, said portion being transmitted separately from the remainder of the structure.

FIG. 1 shows a tree structure comprising a root node 1 partitioned into three lower level nodes, of which a first node 1.1 is not partitioned into lower level nodes, a second node 1.2 comprises two nodes 1.2.1 and 1.2.2, and a third node 1.3 comprises a single node 1.3.1. The two nodes 1.2.1 and 1.2.2 of the second node 1.2 are respectively attached to one lower level node 1.2.1.1 and to two lower level nodes 1.2.2.1 and 1.2.2.2.

This structure represents a structured document D comprising a header H, in which a certain number of parameters are defined that define the coding and display format of the document, and a main body B containing the information and the sets of information constituting the document.

According to the invention, a structured document can be transmitted as a plurality of separate portions P1, P2, P3, i.e. a main portion P1 and secondary portions P2 and P3 which are attached to the main portion (FIG. 2). Such transmission is preferably performed after each portion for separate transmission has been compressed in appropriate manner. Each portion of the document, whether or not it is compressed, comprises a header H, H2, H3 and a main body B1, B2, B3.

As shown in FIG. 3, a main body B of the document comprises a data header DH and one or more data bodies DB, each containing the information of an information subset of the document. The data header DH may have a field K enabling ambiguity to be resolved at the time the document is decoded, in particular by giving a number enabling the following data set to be defined, and/or a field containing the number N of occurrences of the data body DB.

Depending on the format used, each data body DB may comprise a field T specifying the type of information it contains, a field L giving the length of the information as a number of bits or of bytes, a field A containing the attributes of the information subsets, and a field Val containing the value or the content of the information subsets.

Since the document is structured in the form of a tree structure, the field Val may itself contain a data header field DH and one or more fields containing a data body DB.

On this topic, it should be observed that in the structure schema shown in FIG. 1, the information contained in the document is held in the nodes 1.1, 1.2.1.1, 1.2.2.1, 1.2.2.2, and 1.3.1 situated at the ends of the branches, and also in the attribute fields A of the subsets symbolized by all of the nodes of the document.

According to the invention, when it is desired to transmit a part of such a document, and regardless of whether it has previously been compressed, the field T containing the type of the information in a data body DB that has not been transmitted or that has been withdrawn from the document receives a predefined value specifying that the following information subset is not transmitted. This predefined particular value for information type is selected to be equal to zero, for example, when a document is in compressed form, with other types of information having values that are not zero.

If this predefined value appears in the transmitted document, the length field L and the fields A and Val which normally follow the information type do not appear in the transmitted data. Consequently, following an information type that is equal to the predefined value, there is the header DH of the next set of data in the document, or an end-of-document flag.
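The effect of the predefined value on the byte stream can be shown with a simplified Python encoder. It is a sketch under stated assumptions, not the format of the invention: the data header DH is reduced to the occurrence count N (one byte), each data body carries a one-byte type T, a two-byte big-endian length L, and the value Val, the attribute field A is omitted, and zero is used as the predefined value. A body whose type is zero is written as the type byte alone.

```python
import struct

REMOVED = 0  # hypothetical predefined type: "following subset not transmitted"


def encode_bodies(bodies):
    """Encode a list of (type, value) data bodies behind a one-byte count N.

    A REMOVED body is written as its type byte only: no L or Val follows."""
    out = bytearray(struct.pack("B", len(bodies)))
    for body_type, value in bodies:
        out += struct.pack("B", body_type)
        if body_type != REMOVED:
            out += struct.pack(">H", len(value)) + value
    return bytes(out)


def decode_bodies(data):
    """Inverse of encode_bodies; removed subsets come back as (REMOVED, None)."""
    (count,) = struct.unpack_from("B", data, 0)
    pos = 1
    bodies = []
    for _ in range(count):
        (body_type,) = struct.unpack_from("B", data, pos)
        pos += 1
        if body_type == REMOVED:
            # Nothing else belongs to this body: the next byte starts the next one.
            bodies.append((body_type, None))
            continue
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        bodies.append((body_type, data[pos:pos + length]))
        pos += length
    return bodies


encoded = encode_bodies([(3, b"hi"), (REMOVED, None), (5, b"")])
print(encoded.hex())
```

Because the removed body costs a single byte and no length field, the decoder stays synchronized without knowing anything about the missing subset.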

Provision can be made to add a parameter to the document header H to specify whether or not the document is transmitted in full, so as to inform the recipient of the document whether the document that is being received is being transmitted in full or in part.

The portions P1, P2, and P3 may be transmitted separately one or more times. For this purpose, each has a header H, H2, H3 comprising firstly a parameter specifying that the document is not complete, followed by a definition of the location of the transmitted portion in the tree structure of the complete document.
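A portion header of this kind might be laid out as in the following Python sketch. The byte layout (a completeness flag byte, a step-count byte, then one byte per 0-based child index) and the helper names are purely assumptions for illustration; the description does not fix any encoding.

```python
import struct


def encode_header(complete, location):
    """Hypothetical header layout: completeness flag (1 byte), number of
    path steps (1 byte), then one byte per 0-based child index (0-255)."""
    return struct.pack(f"BB{len(location)}B", int(complete), len(location), *location)


def decode_header(data):
    """Return (complete, location, remaining bytes of the portion)."""
    complete, steps = struct.unpack_from("BB", data, 0)
    location = list(struct.unpack_from(f"{steps}B", data, 2))
    return bool(complete), location, data[2 + steps:]


# A secondary portion for subset 1.2.1 (child 1 of the root, then child 0).
portion = encode_header(False, [1, 0]) + b"<body of the portion>"
print(decode_header(portion))
```

The flag is cleared because the document is transmitted in parts; the location lets the recipient place the body without having received any other portion.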

In this way, a structured document can be enriched and modified over time.

It should be observed that there is no need to transmit the main portion P1, since the location definitions appearing in the headers of the secondary portions enable the processor unit which receives the transmitted secondary portions to determine the location of each received portion in the structure of the document and thus to decode it. In addition, the document can be partitioned in such a manner that the main portion does not contain any payload data, so that the entire document can be reconstituted from the secondary portions and their locations within the document structure.
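Reconstruction without a main portion can be sketched in Python under simplifying assumptions: the tree is a nested `dict`, each secondary portion arrives as a pair of a location (a tuple of set names, here the FIG. 1 node numbers) and a payload, and `assemble` is a hypothetical helper. Portions may arrive in any order. This sketch ignores sibling order, which a real decoder would restore from the structure schema.

```python
def assemble(portions):
    """Rebuild a nested-dict tree from (location, payload) pairs.

    Locations are tuples of set names from the root; intermediate sets are
    created on demand, so no main portion is required."""
    tree = {}
    for location, payload in portions:
        node = tree
        for name in location[:-1]:
            node = node.setdefault(name, {})
        node[location[-1]] = payload
    return tree


received = [
    (("1", "1.3", "1.3.1"), "data 1.3.1"),
    (("1", "1.1"), "data 1.1"),
    (("1", "1.2", "1.2.1", "1.2.1.1"), "data 1.2.1.1"),
]
print(assemble(received))
```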

In addition, the headers H, H2, H3 of the portions P1, P2, P3 may contain information specifying a mode of processing the portion relative to an already transmitted portion associated with the same location in the structure, for example whether the transmitted portion is to replace an already transmitted portion associated with the same location, or whether it should not be taken into account if it already appears in the received document, or indeed whether it should be merged with the already transmitted portion associated with the same location.
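The three processing modes can be sketched as a receiver-side update rule. In this Python illustration the `Mode` values, the `receive` function, the use of a string location as a key, and the field-by-field merge semantics are all assumptions; the description only names the three behaviours. The default mode reproduces the rule that the most recently transmitted portion replaces the previous one.

```python
from enum import Enum


class Mode(Enum):
    REPLACE = "replace"  # the newer portion supersedes the stored one
    IGNORE = "ignore"    # keep the stored portion if there is one
    MERGE = "merge"      # combine both (dict union, newer fields win: an assumption)


def receive(store, location, body, mode=Mode.REPLACE):
    """Apply one received portion to `store`, keyed by its location."""
    if location not in store or mode is Mode.REPLACE:
        store[location] = dict(body)
    elif mode is Mode.MERGE:
        store[location] = {**store[location], **body}
    # Mode.IGNORE with a portion already stored: nothing to do.
    return store


store = {}
receive(store, "/c/a", {"title": "v1"})
receive(store, "/c/a", {"title": "v2"}, Mode.IGNORE)
receive(store, "/c/a", {"rating": 5}, Mode.MERGE)
print(store)
```

A portion that arrives for a location not yet stored is always kept, whatever its mode, which suits a recipient that joins the transmission late.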

As shown in FIG. 4, this definition of location may comprise the names of all of the higher nodes going back to the root node R, possibly associated with an order number relative to the higher node. For example, the first node of the first node of the third node of the first node attached to the root node (identified in FIG. 4 by a sequence of arrows coming from the root node R) can be referenced as follows:

/c/a[last]/b[1]/d

This notation indicates that it is a node of type “d” connected to the first node of type “b” connected to the last node of type “a” connected to the node of type “c” which is directly connected to the root node R.
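A resolver for such absolute paths can be sketched in Python with the standard `xml.etree.ElementTree` module. The sketch assumes that each step is written `name[index]`, with a 1-based index or `last`, in the bracket style of the relative example `../e[2]`; under that reading the FIG. 4 example is `/c/a[last]/b[1]/d`. A step without an index is assumed to select the first child of that type. The tree, the `id` attribute, and the function names are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# One path step: a type name, optionally followed by [n] (1-based) or [last].
STEP = re.compile(r"(\w+)(?:\[(\d+|last)\])?")


def select_child(node, token):
    """Pick one child of `node` matching a step such as 'a[last]' or 'b[1]'."""
    match = STEP.fullmatch(token)
    if match is None:
        raise ValueError(f"bad step: {token!r}")
    name, index = match.groups()
    candidates = [child for child in node if child.tag == name]
    if not candidates:
        raise LookupError(f"no child of type {name!r}")
    if index == "last":
        return candidates[-1]
    return candidates[int(index or 1) - 1]  # no index: first child of that type


def resolve_absolute(root, path):
    """Follow an absolute path such as '/c/a[last]/b[1]/d' down from the root R."""
    node = root
    for token in path.strip("/").split("/"):
        node = select_child(node, token)
    return node


# Hypothetical tree shaped like the FIG. 4 example: the third child of c is the last a.
TREE = ET.fromstring('<R><c><a/><e/><a><b><d id="target"/></b><b/></a></c></R>')
print(resolve_absolute(TREE, "/c/a[last]/b[1]/d").get("id"))
```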

Other portions of the document can then be transmitted either by using the absolute definition method (relative to the root node R) as described above, or else, and advantageously, by using a relative definition method. Thus, for example, the third node connected to the same node immediately above the preceding node may be referenced as follows:

../e[2]

This notation states that reference is being made to the second node of type “e” that is connected to the same node at the immediately higher level, as indicated by the notation “../”. It can be seen that this second method is more compact than the first.
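A relative reference can be resolved in a similar hypothetical way, starting from the node located by the previous portion rather than from the root. This Python sketch assumes the `name[index]` step syntax of `../e[2]`, with `..` meaning the node at the immediately higher level. Because `xml.etree.ElementTree` elements carry no parent link, the sketch first builds a child-to-parent map. In the invented tree, the target is the third child of `c` and the second of type `e`.

```python
import re
import xml.etree.ElementTree as ET

STEP = re.compile(r"(\w+)(?:\[(\d+|last)\])?")


def resolve_relative(root, previous, path):
    """Follow a path such as '../e[2]' starting from the previously located node."""
    # ElementTree nodes do not know their parent, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = previous
    for token in path.split("/"):
        if token == "..":
            node = parents[node]
            continue
        match = STEP.fullmatch(token)
        if match is None:
            raise ValueError(f"bad step: {token!r}")
        name, index = match.groups()
        candidates = [child for child in node if child.tag == name]
        node = candidates[-1] if index == "last" else candidates[int(index or 1) - 1]
    return node


TREE = ET.fromstring('<R><c><e id="e1"/><a id="prev"/><e id="e2"/></c></R>')
previous = TREE.find(".//a[@id='prev']")
print(resolve_relative(TREE, previous, "../e[2]").get("id"))
```

The relative path never repeats the steps above the common parent, which is where its compactness comes from.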

Alternatively, the location of the transmitted portion P2, P3 of the document may be defined merely by means of a reference to the document portion, said reference having already been transmitted in the main portion P1 of the document, e.g. following the predefined value specifying that the following information subset is not transmitted.
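With reference-based location, a receiver only has to match references. In this Python sketch the reference strings such as `ref-7`, the (type, payload) representation, the helper `fill_references`, and zero as the predefined value are assumptions for illustration.

```python
REMOVED = 0  # hypothetical predefined type value


def fill_references(main_bodies, secondary):
    """Substitute received secondary portions into the main portion.

    `main_bodies` is a list of (type, payload) pairs in which a REMOVED type is
    followed by a reference instead of data; `secondary` maps each reference to
    a (type, payload) portion. Placeholders whose portion has not yet arrived
    are kept as they are."""
    filled = []
    for body_type, payload in main_bodies:
        if body_type == REMOVED and payload in secondary:
            filled.append(secondary[payload])
        else:
            filled.append((body_type, payload))
    return filled


main_p1 = [(3, "title"), (REMOVED, "ref-7"), (REMOVED, "ref-9")]
received = {"ref-7": (4, "scene data")}
print(fill_references(main_p1, received))
```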

Preferably, the document, or the portions P1, P2, P3 of the document for transmission, are compressed beforehand. For this purpose, it is advantageous in each document portion to distinguish between structure information and content information, given that certain document portions need not contain any content information. Thus, in the example of FIGS. 2 and 3, the structure information is constituted by all of the fields except for the value fields Val when these fields are not structured, i.e. when they are not capable of being partitioned into structured subsets of information. In the example of FIG. 2, these are the fields Val of the information subsets 1.1, 1.2.1.1, 1.2.2.1, 1.2.2.2, and 1.3.1, situated at the bottom ends of the branches of the document tree structure.

Compression processing proper consists, for example, in reading the portion of the document that is to be compressed sequentially, in applying an appropriate compression algorithm for processing the structure information, and in applying a compression algorithm adapted to the information type when a non-partitionable field Val appears while reading the document portion. It should be observed that in a compressed document or document portion, the structure information and the content information appear in the same order as in the original, non-compressed document.
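The sequential read can be sketched in Python as a single depth-first pass that routes structure and content to separate streams. The stream representation, the end markers, and the use of `zlib` (an implementation of DEFLATE, the algorithm commonly used in Zip archives) for every stream are assumptions of this sketch; the description calls for a structure-specific algorithm and content algorithms adapted to each information type, which are not specified here.

```python
import zlib
import xml.etree.ElementTree as ET


def split_streams(elem, structure, content):
    """Read the tree depth-first, in document order.

    Tag names go to the structure stream; the values of non-partitionable
    (leaf) sets go to the content stream, so both keep the original order."""
    structure.append(elem.tag)
    children = list(elem)
    for child in children:
        split_streams(child, structure, content)
    if not children:
        content.append((elem.text or "").encode("utf-8"))
    structure.append("/" + elem.tag)  # end marker keeps the nesting recoverable
    return structure, content


def compress_portion(xml_text):
    """Compress structure and content separately (zlib stands in for both the
    structure coder and the type-specific content coders)."""
    structure, content = split_streams(ET.fromstring(xml_text), [], [])
    packed_structure = zlib.compress("\n".join(structure).encode("utf-8"))
    packed_content = [zlib.compress(value) for value in content]
    return packed_structure, packed_content


structure, content = split_streams(ET.fromstring("<a><b>x</b><c>y</c></a>"), [], [])
print(structure, content)
```

Keeping one compressed content item per leaf preserves the document order noted above, so a decoder can interleave the two streams again while reading the structure.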

It is also possible to apply a statistical compression algorithm, suchas Zip.

1-12. (canceled)
13. A method of handling at least one structured document having a hierarchical structure defined in a structure schema, the structured document comprising a main structured set of information including information subsets, at least one of the information subsets being structured and including information subsets of lower hierarchical level, each information subset being associated in a higher level information set with a respective information type, the structure corresponding to each information type being defined in the structure schema, the structured document being divided into structured portions capable of being handled individually, namely a main portion and at least one secondary portion, the main portion containing at least a main set of information, and the secondary portion containing an information subset which is removed from the main set of information, each secondary portion being attached to the main portion or to another secondary portion, the structured document comprising, in each information set from which at least one information subset has been removed, the information type of each removed information subset having a predefined allocated value, said method comprising the steps of: receiving by a recipient a data stream formed by at least one secondary portion, reading by the recipient at least some received secondary portions, the step of reading comprising at least one step of updating over time the plurality of secondary portions associated with a same location in the structure in accordance with at least one predefined rule.
14. The method according to claim 13, wherein during the step of updating, the most recently received secondary portion replaces the previously received secondary portion associated with the same location in the structure.
15. The method according to claim 13, wherein a header of each read secondary portion contains information specifying a processing mode to be applied to said secondary portion relative to an already received secondary portion associated with the same location in the structure.
16. The method according to claim 13, wherein the structure schema of the structured document is known by the recipient.
17. The method according to claim 13, wherein the structured document is partitioned in such a manner that the main portion does not contain any payload data, so that the entire document is reconstituted from the secondary portions and their locations within the document structure.
18. The method according to claim 13, wherein the data stream comprises the main portion, the step of reading comprising at least one step of updating over time a plurality of main portions associated with a same location in the structure in accordance with at least one predefined rule.